The Dissecting Power of Regular Languages 
Abstract: A study on structural properties of regular and context-free languages has promoted
our basic understanding of the complex behaviors of those languages. We continue the study
to examine how regular languages behave when they are "almost halving" numerous infinite
languages. In particular, we focus on a situation in which a regular language "dissects" a
target infinite language into two infinite subsets. Every context-free language and its complement
can be dissected by carefully chosen regular languages. By expanding the scope of our study,
we show that constantly-growing languages and semi-linear languages are also dissectable;
however, their complements as well as intersections are not. Under certain natural conditions, the
complements and finite intersections of semi-linear languages become dissectable. Similarly,
restricted to bounded languages, the intersections of finitely many bounded context-free languages
and, more surprisingly, the entire Boolean hierarchy over bounded context-free languages are
dissectable. As an immediate application, we show a structural property in which an appropriate
bounded context-free language can separate, with infinite margins, two given infinite bounded
context-free languages, one of which contains the other with an infinite margin. This property is
closely related to a notion and result of Domaratzki, Shallit, and Yu (2001).

1 Background Knowledge and Results' Overview

Since the notion of context-free language was conceived and formulated as a mathematical model of natural
languages by Chomsky [3] in the 1950s, it has remained an intriguing research subject for almost six
decades both in theory and in practice. In formal language theory, context-free languages have been of great
importance in, for instance, parsing programming languages since their introduction. In an early stage of the
study of context-free languages, a useful "structural" property, known as semi-linearity, was discovered in
[10], and another useful property, later dubbed a pumping lemma, was proven in [1]. The former property
dictates a behavioral pattern of the number of times each symbol occurs inside each string of a given language,
whereas the latter indicates the existence of numerous sequences of constantly-growing strings inside the
language. The underlying structures of regular languages, in contrast, have been widely understood through
a number of different frameworks, including the Myhill-Nerode theorem, monadic second-order logic, and
finitely generated monoids.



Recently, new realms of structural properties that highlight the context-freeness of languages have been
developed in an obvious connection to structural complexity issues of polynomial time-bounded complexity
classes. For instance, the notions of immunity as well as pseudorandomness were introduced into context-free
languages in [14]. The notion of minimal cover was also applied to regular languages in [4]. These properties
have left unsolved numerous problems concerning the structural properties of regular and context-free
languages, which, we suspect, might be rooted in certain unknown natures of the languages. To promote
our understanding of regular and context-free languages, it is desirable to unearth those hidden natures.
In this line of study, this paper aims at exploring another natural structural property, which we fondly name
"dissectability." This property, however, is most interesting for weak computation. One reason is that,
for instance, polynomial-time computable languages are so powerful that they easily dissect any "computable"
language of infinite size.

Normally, regular languages are considered to be weak in recognition power; however, for certain simple
tasks, they can exhibit surprisingly high power. One such task is to "dissect" infinite languages in
certain obvious ways. As we will illustrate shortly, even computationally hard infinite languages can
be dissected into "almost halves" of infinite sizes using only the power of regular languages. More precisely,
an infinite set $C$ is said to dissect a target infinite set $L$, as illustrated in Fig. 1, if the two disjoint sets
$C \cap L$ and $\overline{C} \cap L$ are both infinite, where $\overline{C}$ is the complement of $C$. Seemingly, such dissection is one of the simplest
actions to exercise when we try to analyze a basic structure of a target set. When every infinite set in a
language family $\mathcal{C}$ is dissected by regular languages, we succinctly say that $\mathcal{C}$ is REG-dissectable. As a quick
example, let us consider a language $L$ generated by a grammar whose productions include a special form
$S \to SS$, where $S$ is the start symbol. Although this language $L$ could be quite hard in complexity, it can
be easily dissected by a regular language composed of strings of lengths that are equal to zero modulo 3.
This dissectability is explained by the fact that $L$ contains a series of strings of lengths $2k, 2^2 k, 2^3 k, \ldots$ for
a certain fixed constant $k > 0$.

A typical example of a REG-dissectable family is the family of context-free languages. From Section 3 onward, two
wider families of languages are also discussed. Constantly-growing languages and semi-linear languages are
naturally dissected by regular languages. Under certain conditions, the complements, the intersections, and
the differences of semi-linear languages are also REG-dissectable, which we show by elaborate analyses of length patterns
of strings inside a given language. The analyses involve a manipulation of solutions of "semi-linear" equations.
Those conditions are shown to be necessary to guarantee the REG-dissectability. On the contrary, a rather
obvious limitation exists for the REG-dissectability; namely, as shown in Section 3, there is a logarithmic-space
computable language that cannot be dissected by any regular language. Taking a step further,
we will show that the class of the complements of context-free languages is REG-dissectable, essentially by
an application of the aforementioned pumping lemma. More surprisingly, when limited to the bounded languages
of Ginsburg and Spanier [5], we can show that the intersections of finitely many context-free languages are
dissected by appropriate regular languages. This REG-dissectability result signifies the power of regular
languages, because the intersections of $k$ bounded context-free languages for $k \ge 1$ form an infinite hierarchy
within the class of context-sensitive languages [9]. Our result can be obtained, together with a result
from [7], by an argument that is analogous to the argument mentioned earlier for semi-linear languages. By
elaborating our argument further, we will prove that the entire Boolean hierarchy over the class of bounded
context-free languages is also REG-dissectable. These results will be presented in Section 6. One challenging
open question is to prove that the Boolean hierarchy over context-free languages is truly REG-dissectable.

The REG-dissectability notion has several connections to other notions. Earlier, Domaratzki, Shallit,
and Yu [4] studied a notion of minimal cover, which means the "smallest" superset $A$ of a given set $B$,
where "smallest" means that there is no set between $A$ and $B$ with margins of infinite sizes. Motivated by
their notion and results, we examine a structural property of separating two infinite nested languages with
infinite margins. By "separation with infinite margins" (or i-separation, in short), we mean, as
illustrated in Fig. 2, that a pair $(B, A)$ of infinite sets, where $A$ "covers with an infinite margin" (or i-covers,
in short) $B$, can be separated by a single set $C$ that lies in between the two sets with infinite margins. As an
immediate application of the aforementioned REG-dissectability results for bounded context-free languages,
we will show in Section 7 that two bounded context-free languages can be i-separated by bounded
context-free languages in the above sense. This i-separation result can be further extended to any level of the
Boolean hierarchy over bounded context-free languages.

2 Notions and Notations 

We briefly explain a set of basic notions and notations used in the subsequent sections. We denote by $\mathbb{N}$ the
set of all natural numbers (i.e., nonnegative integers). For brevity, we set $\mathbb{N}^{+}$ to be $\mathbb{N} - \{0\}$. Associated with
three arbitrary numbers $a, b, k \in \mathbb{N}$, we define $A_{a,b,k}$ as the set $\{an + b \mid n \in \mathbb{N}, n \ge k\}$. For any countable
set $A$, the succinct notation $|A| = \infty$ (resp., $|A| < \infty$) indicates that $A$ is an infinite (resp., a finite) set.
Moreover, for two countable sets $A$ and $B$, we write $A \subseteq_{ae} B$ to mean $|A - B| < \infty$, and we use the notation
$A =_{ae} B$ whenever $A \subseteq_{ae} B$ and $B \subseteq_{ae} A$ hold.

We usually denote by $\Sigma$ an alphabet (i.e., a non-empty finite set) and, for a string $x$ whose symbols are
chosen from $\Sigma$, we write $|x|$ to denote the length of $x$ (i.e., the number of occurrences of all symbols in $x$).
The empty string is always denoted $\lambda$ and the length $|\lambda|$ is zero. The notation $\Sigma^*$ denotes the set of all
strings over $\Sigma$; in contrast, $\Sigma^{+}$ expresses the set $\Sigma^* - \{\lambda\}$. A language over $\Sigma$ is a subset of $\Sigma^*$. For a string
$w$, $w^R$ denotes the string $w$ in reverse; in addition, for a language $L$, $L^R$ denotes the set $\{w^R \mid w \in L\}$. The
concatenation of two strings $x$ and $y$ is denoted $xy$. For any string $x$ and any symbol $\sigma$, the notation $\#_\sigma(x)$
stands for the number of occurrences of $\sigma$ in $x$. For any language $S$, the length set of $S$, denoted $LT(S)$,
is the collection of all lengths $|x|$ for any string $x$ in $S$.

Figure 1: $C$ dissects $L$. Figure 2: $C$ i-separates $(B, A)$.

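For two arbitrary languages $A$ and $B$ over the same alphabet $\Sigma$, the difference between $A$ and $B$,
denoted $A - B$, is the set $\{x \in \Sigma^* \mid x \in A, x \notin B\}$. The complement of $A$ is the set $\Sigma^* - A$ and it is
denoted $\overline{A}$ as far as its underlying alphabet $\Sigma$ is clear from the context. A language is co-infinite if its
complement is infinite. For ease of notation, we use the following four class operations (see, e.g., [8]):
(1) $\mathcal{C} \wedge \mathcal{D} = \{C \cap D \mid C \in \mathcal{C}, D \in \mathcal{D}\}$, (2) $\mathcal{C} \vee \mathcal{D} = \{C \cup D \mid C \in \mathcal{C}, D \in \mathcal{D}\}$, (3) $\mathcal{C} - \mathcal{D} = \{C - D \mid C \in \mathcal{C}, D \in \mathcal{D}\}$,
and (4) co-$\mathcal{C} = \{\overline{C} \mid C \in \mathcal{C}\}$, where $\mathcal{C}$ and $\mathcal{D}$ are language families.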

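For convenience, we write REG and CFL to denote the sets of all regular languages and of all context-free
languages, respectively. The language family CFL($k$) (the $k$-conjunctive closure of CFL [13, 14]) is defined
inductively as follows: CFL(1) = CFL and CFL($k$) = CFL($k-1$) $\wedge$ CFL for $k \ge 2$. Liu and Weiner [9]
showed that $\{\mathrm{CFL}(k) \mid k \in \mathbb{N}^{+}\}$ forms an infinite hierarchy. The Boolean hierarchy over CFL is defined
as follows: $\mathrm{CFL}_1 = \mathrm{CFL}$, $\mathrm{CFL}_{2k} = \mathrm{CFL}_{2k-1} \wedge \text{co-}\mathrm{CFL}$, and $\mathrm{CFL}_{2k+1} = \mathrm{CFL}_{2k} \vee \mathrm{CFL}$ for every $k \in \mathbb{N}^{+}$.
Define $\mathrm{CFL}_{BH} = \bigcup_{k \ge 1} \mathrm{CFL}_k$. Note that $\mathrm{CFL}_k \subseteq \mathrm{CFL}_{k+1}$ for any index $k \in \mathbb{N}^{+}$. Obviously, it holds that
$\mathrm{CFL}_{2k} = \mathrm{CFL}_{2k-1} - \mathrm{CFL}$. Since $\mathrm{CFL}_2$ coincides with $\mathrm{CFL} \wedge \text{co-}\mathrm{CFL}$, it holds that $\mathrm{CFL} \cup \text{co-}\mathrm{CFL} \subseteq \mathrm{CFL}_2$.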

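To introduce a notion of (deterministic) advice that is fed to finite automata beside input strings, we
adopt the "track" notation of [11]. For two symbols $\sigma \in \Sigma$ and $\tau \in \Gamma$, the notation $\left[\begin{smallmatrix}\sigma\\ \tau\end{smallmatrix}\right]$ expresses a new
symbol made up of $\sigma$ and $\tau$. On the input tape, this new symbol is written in a single tape cell, which is split
into two tracks, whose upper track contains $\sigma$ and the lower one contains $\tau$. Notice that an automaton's
tape head scans the two track symbols $\sigma$ and $\tau$ in $\left[\begin{smallmatrix}\sigma\\ \tau\end{smallmatrix}\right]$ at once. For two strings $x$ and $y$ of the same length $n$, $\left[\begin{smallmatrix}x\\ y\end{smallmatrix}\right]$
denotes the concatenated string $\left[\begin{smallmatrix}x_1\\ y_1\end{smallmatrix}\right]\left[\begin{smallmatrix}x_2\\ y_2\end{smallmatrix}\right]\cdots\left[\begin{smallmatrix}x_n\\ y_n\end{smallmatrix}\right]$, provided that $x = x_1x_2\cdots x_n$ and $y = y_1y_2\cdots y_n$. An advice
function is a function mapping $\mathbb{N}$ to $\Gamma^*$, where $\Gamma$ is an alphabet, called an advice alphabet. The advised
language family REG/n of Tadaki et al. [11] is the collection of all languages $L$ over certain alphabets $\Sigma$
such that there exist a 1dfa $M$, an advice alphabet $\Gamma$, and an advice function $h : \mathbb{N} \to \Gamma^*$ for which (i) for
every length $n \in \mathbb{N}$, $|h(n)| = n$ and (ii) for every string $x \in \Sigma^*$, $x \in L$ iff $M$ accepts $\left[\begin{smallmatrix}x\\ h(|x|)\end{smallmatrix}\right]$. Similarly, CFL/n
was defined in [13].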

Finally, we introduce a notion of "immunity." Let $\mathcal{F}$ be any family of languages. A language $S$ is said to
be $\mathcal{F}$-immune if $S$ is infinite and $S$ has no infinite subset belonging to $\mathcal{F}$ (see, e.g., [14]).

3 How to Dissect Languages 

Let us recall from Section 1 the notion of REG-dissectability. More generally, for any non-empty language
family $\mathcal{C}$, we say that an infinite language $S$ is $\mathcal{C}$-dissectable if there exists a language $C$ in $\mathcal{C}$ that dissects $S$
(i.e., $|C \cap S| = |\overline{C} \cap S| = \infty$). A non-empty language family $\mathcal{F}$ is said to be $\mathcal{C}$-dissectable if every infinite
language in $\mathcal{F}$ is $\mathcal{C}$-dissectable. Notice that this definition disregards all finite languages inside $\mathcal{F}$, and thus
we implicitly assume that $\mathcal{F}$ always contains infinite languages.

The choice of $\mathcal{C}$ in the definition of $\mathcal{C}$-dissectability is of great importance. In particular, low-complexity
languages are most interesting for dissectability. One reason is that high-complexity languages are so
powerful that they dissect most infinite languages with ease. To see this fact, we will present two simple examples. In the
first example, we consider the class P of all languages recognized by multi-tape Turing machines running
in polynomial time. With the power of languages in P, we can dissect recursive languages of infinite size.
Notationally, for a set $S$, we write $S(x) = 1$ (resp., $S(x) = 0$) to mean that $x \in S$ (resp., $x \notin S$).

Example 3.1 We claim that every infinite recursive language is P-dissectable. Let $L$ be any infinite
language, over an alphabet $\Sigma$, recognized by a single-tape Turing machine $M$ that eventually halts on all inputs.
For simplicity, let $\Sigma = \{0,1\}$ and assume that $L \neq \Sigma^*$ because, otherwise, the set $C = \{0x \mid x \in \Sigma^*\}$
easily dissects $L$. Now, we define $C$ as follows. Let $z_0, z_1, z_2, \ldots$ be a standard lexicographic order of all
strings. For each string $x$, we go through the following procedure from round $0$ to round $|x|$. Initially, we set
$A = R = \emptyset$. At round $i$, we first recover the value $C(z_i)$ by following the defining process of $C(z_i)$. We then
simulate $M$ on the input $z_i$ within $|x|$ steps. Assume that $M(z_i) = 1$. Update $A$ to be $A \cup \{i\}$ if $C(z_i) = 1$;
let $R$ be $R \cup \{i\}$ if $C(z_i) = 0$. Whenever either $M(z_i) = 0$ or $M(z_i)$ is not obtained within $|x|$ steps, we do
nothing. After round $|x|$, if $|A| > |R|$, then define $C(x) = 0$; otherwise, define $C(x) = 1$. Clearly, $C$ is in P.
By a diagonalization argument, we can show that $|C \cap L| = |\overline{C} \cap L| = \infty$. Therefore, every infinite recursive
language can be dissected by a certain language in P.

In the second example, we will show that a simple use of advice makes it possible to dissect an arbitrary 
language even by regular languages. 

Example 3.2 We claim that every infinite language is REG/n-dissectable. To show this claim, take any infinite
language $S$ over an alphabet $\Sigma$. Since $S$ is infinite, the length set $LT(S)$ is also infinite. Hence, we partition
$LT(S)$ into two infinite subsets, say, $S_1$ and $S_2$; that is, $S_1 \cap S_2 = \emptyset$, $LT(S) = S_1 \cup S_2$, and $|S_1| = |S_2| = \infty$.
We also assume that $0 \notin S_1$. Now, we define an advice function $h : \mathbb{N} \to \{0,1\}^*$ as follows: let $h(n) = 10^{n-1}$
if $n \in S_1$ and $h(n) = 0^n$ otherwise. We also define a dfa $M$ as follows: on input $\left[\begin{smallmatrix}x\\ y\end{smallmatrix}\right]$, if $y = 10^{n-1}$, then
$M$ accepts the input; otherwise, it rejects the input. Define $C = \{x \mid M \text{ accepts } \left[\begin{smallmatrix}x\\ h(|x|)\end{smallmatrix}\right]\}$, which belongs to
REG/n. Obviously, for any $x \in S$ with $|x| \in S_1$, since $h(|x|) = 10^{|x|-1}$, $M$ accepts $\left[\begin{smallmatrix}x\\ h(|x|)\end{smallmatrix}\right]$. It thus holds
that $|C \cap S| = \infty$. Similarly, for any $x \in S$ with $|x| \in S_2$, $M$ rejects $\left[\begin{smallmatrix}x\\ h(|x|)\end{smallmatrix}\right]$. Thus, $|\overline{C} \cap S| = \infty$ holds. In
conclusion, $C$ dissects $S$.
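
The construction in Example 3.2 can be illustrated by a small Python sketch. It is only an illustration under stated assumptions: the names `make_advice`, `track_dfa_accepts`, and `in_C` are hypothetical, the set $S_1$ is passed as a membership predicate on lengths, and the two-track input is modeled as a pair of equal-length strings rather than a single tape.

```python
from typing import Callable

def make_advice(in_s1: Callable[[int], bool]) -> Callable[[int], str]:
    """Build the advice function h of Example 3.2 (hypothetical helper).

    h(n) = 1 0^(n-1) if n belongs to S_1 (and n > 0), and h(n) = 0^n otherwise.
    """
    def h(n: int) -> str:
        if n > 0 and in_s1(n):
            return "1" + "0" * (n - 1)
        return "0" * n
    return h

def track_dfa_accepts(x: str, y: str) -> bool:
    """Simulate the dfa M on the two-track input [x / y].

    M ignores the upper track x and accepts iff the lower track y has the
    form 1 0^(n-1); three states suffice.
    """
    if len(x) != len(y):
        raise ValueError("both tracks must have the same length")
    state = "start"
    for symbol in y:
        if state == "start":
            state = "seen_one" if symbol == "1" else "dead"
        elif state == "seen_one" and symbol != "0":
            state = "dead"
    return state == "seen_one"

def in_C(x: str, h: Callable[[int], str]) -> bool:
    """Membership in C = {x : M accepts [x / h(|x|)]}."""
    return track_dfa_accepts(x, h(len(x)))

# Assumed toy instance: S_1 = even positive lengths, S_2 = all other lengths.
h = make_advice(lambda n: n % 2 == 0)
```

With this toy advice, every string of even positive length lands in $C$ and every string of odd length lands in its complement, regardless of how hard the target language $S$ is; all the work is done by the advice, not by the automaton.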

In the rest of this paper, we will focus our attention on the case of REG-dissectability. A pattern of the
lengths of strings in a target language plays a key role in the REG-dissectability. We turn our attention to
particular languages whose strings satisfy a certain length condition, known as a "constant growth property."
Formally, a language $L$ is said to be constantly growing if there exist a constant $p > 0$ and a finite subset
$K \subseteq \mathbb{N}^{+}$ that satisfy the following condition: for every string $x \in L$ with $|x| \ge p$, there exist a string $y \in L$
and a constant $c \in K$ for which $|x| = |y| + c$ holds. Such a language can be easily dissected by regular
languages as shown below.

Lemma 3.3 Every constantly-growing language is REG-dissectable.

Proof. Let $L$ be any infinite language over an alphabet $\Sigma$. Now, we assume that $L$ is constantly growing with
a constant $p$ and a finite set $K$. Let $c$ be the maximal element in $K$. For each index $i \in [c]$, we define a
language $L_i = \{x \in L \mid |x| \equiv i \pmod{c+1}\}$. We want to claim that there are at least two distinct indices
$i_1, i_2 \in [c]$ such that $|L_{i_1}| = |L_{i_2}| = \infty$. Assume otherwise. Since $L = \bigcup_{i \in [c]} L_i$, at least one of the $L_i$'s is
infinite. Our assumption implies that exactly one index $i \in [c]$ makes $L_i$ infinite. We fix such an index. For
each constant $j \in [c]$, define $S_{i,j} = \{y \in L \mid \exists x \in L_i\, [\,|x| = |y| + j\,]\}$. Since $L$ is constantly growing, the set
$S_{i,j}$ is infinite for a certain index $j \in K$. Every $y \in S_{i,j}$ satisfies $|y| \equiv i - j \pmod{c+1}$, and hence
$S_{i,j} \subseteq L_{(i-j) \bmod (c+1)}$, which implies that $L_{(i-j) \bmod (c+1)}$ is infinite. Because $1 \le j \le c$, the index $(i-j) \bmod (c+1)$
differs from $i$. This contradicts the uniqueness of $i$. Therefore, there are at least two distinct indices $i_1, i_2 \in [c]$ such that
$|L_{i_1}| = |L_{i_2}| = \infty$.
We define $C = \{x \in \Sigma^* \mid |x| \equiv i_1 \pmod{c+1}\}$. Clearly, $C$ is regular. Moreover, $L_{i_1} \subseteq C$ and $L_{i_2} \subseteq \overline{C}$.
This implies that $|C \cap L| = |\overline{C} \cap L| = \infty$. In other words, $C$ dissects $L$, as required. □
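
The residue argument in the proof of Lemma 3.3 can be explored numerically. The following Python sketch is an illustration only: it works on a finite sample of lengths, so it can suggest, but never certify, which residue classes modulo $c+1$ are infinite. All function names are hypothetical, and the toy language $\{a^n b^n\}$ is an assumed example.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

def residue_profile(lengths: Iterable[int], c: int) -> Counter:
    """Count how many distinct sampled lengths fall in each class mod c+1."""
    return Counter(n % (c + 1) for n in sorted(set(lengths)))

def candidate_dissector(lengths: Iterable[int], c: int) -> Optional[Tuple[int, int]]:
    """Return (i1, c+1) describing C = {x : |x| = i1 (mod c+1)}, or None.

    The sample must populate at least two residue classes, mirroring the
    two indices i1 and i2 of the proof; otherwise no candidate is returned.
    """
    profile = residue_profile(lengths, c)
    if len(profile) < 2:
        return None
    i1 = profile.most_common(1)[0][0]
    return i1, c + 1

def in_C(x: str, i1: int, modulus: int) -> bool:
    """Membership test for the regular language C (a length counter)."""
    return len(x) % modulus == i1

# Assumed toy language {a^n b^n}: lengths 0, 2, 4, ..., so K = {2} and c = 2.
sample = [2 * n for n in range(30)]
i1, modulus = candidate_dissector(sample, 2)
inside = [n for n in sample if n % modulus == i1]
outside = [n for n in sample if n % modulus != i1]
```

In this toy instance the sampled lengths spread evenly over the three classes modulo 3, so both `inside` and `outside` are non-empty, which matches the conclusion that a length counter modulo $c+1$ dissects the language.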

The property of constant growth is not necessary for the REG-dissectability. For example, the language
exemplified in Section 1 may not be constantly growing; however, it is REG-dissectable. For a wider
application, it is therefore desirable to strengthen Lemma 3.3 slightly. In what follows, we succinctly write
CGL for the family of all constantly-growing languages and use the notion of CGL-immunity.

Proposition 3.4 Every infinite language that is not CGL-immune is REG-dissectable.

This proposition follows from Lemma 3.3 and the next trivial lemma. The latter lemma is also useful in
proving certain closure properties in Section 4.

Lemma 3.5 For any two infinite languages $A$ and $B$, if $A$ is REG-dissectable and $A \subseteq B$, then $B$ is also
REG-dissectable.

Proof. This is trivial because any language that dissects $A$ can dissect $B$ whenever $B$ is a superset of $A$.
□

By contrast, we will show an obvious limitation of the REG-dissectability. Following a convention, the
notation L stands for the family of all languages that can be recognized by two-way deterministic Turing
machines using a read-only input tape together with a fixed number of logarithmic space-bounded read/write
work tapes. In the next proposition, we show that L contains a language that is not REG-dissectable.
This result shows a clear limitation of the dissecting power of regular languages.

Proposition 3.6 The language family L is not REG-dissectable.

The proof of Proposition 3.6 requires the following technical property of unary regular languages. Recall
the notation $A_{a,b,k}$ and, in addition, set $\mathcal{G} = \{(a,b,k) \mid a,b,k \in \mathbb{N}, b < a\}$ for the description of the property.

Lemma 3.7 For any unary language $S$, $S$ is regular iff there exists a finite set $G \subseteq \mathcal{G}$ for which
$LT(S) = \bigcup_{(a,b,k) \in G} A_{a,b,k}$.

Proof. Let $S$ be any language over $\Sigma = \{0\}$.

(If-part) Let $G$ be any finite subset of $\mathcal{G}$ and assume that $LT(S) = \bigcup_{(a,b,k) \in G} A_{a,b,k}$. For brevity, we write
$S_{a,b,k}$ for the set $\{0^n \mid n \in A_{a,b,k}\}$. It thus holds that $S_{a,b,k} = \{0^n \mid n = ai + b, i \ge k\} = \{0^{ak+b}(0^a)^i \mid i \in \mathbb{N}\}$.
Clearly, $S_{a,b,k}$ is regular because $a, b, k$ are all constants. Since $G$ is finite and $S = \bigcup_{(a,b,k) \in G} S_{a,b,k}$, $S$ is also
regular.

(Only If-part) Since $S \in$ REG, by [4, Lemma 2], there exist two integers $d \ge 0$ and $a \ge 1$ and two
sets $A \subseteq \{0^i \mid 0 \le i < d\}$ and $B \subseteq \{0^i \mid d \le i < a + d\}$ such that $S = A + B(0^a)^*$. Note that $LT(S)$
equals the union $LT(A) \cup \{an + j \mid j \in LT(B), n \ge 0\}$. Since $A_{0,i,0} = \{i\}$ and $A_{a,j,0} = \{an + j \mid n \ge 0\}$,
it suffices to define $G = \{(0,i,0) \mid i \in LT(A)\} \cup \{(a,j,0) \mid j \in LT(B)\}$ for the desired equality that
$LT(S) = \bigcup_{(a,b,k) \in G} A_{a,b,k}$. □
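
Lemma 3.7 turns membership in a unary regular language into a purely arithmetic test on the input length. A minimal Python sketch follows; the function names are hypothetical and the set `G` is a toy instance chosen only for illustration.

```python
from typing import Iterable, Tuple

Triple = Tuple[int, int, int]  # (a, b, k) describing A_{a,b,k}

def in_progression(n: int, a: int, b: int, k: int) -> bool:
    """Decide whether n lies in A_{a,b,k} = {a*i + b : i >= k}."""
    if a == 0:
        return n == b            # A_{0,b,k} is the singleton {b}
    return n >= a * k + b and (n - b) % a == 0

def unary_accepts(n: int, G: Iterable[Triple]) -> bool:
    """Decide whether 0^n belongs to the unary language whose length set
    is the union of A_{a,b,k} over all (a, b, k) in G."""
    return any(in_progression(n, a, b, k) for (a, b, k) in G)

# Assumed toy instance: lengths equal to 1, or even and at least 4.
G = [(0, 1, 0), (2, 0, 2)]
```

Since `G` is finite, the test inspects a bounded number of progressions, which reflects why the union in the lemma describes a regular language.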

Now, we give the proof of Proposition 3.6.



Proof of Proposition 3.6. Consider the unary language $S = \{0^{n!} \mid n \in \mathbb{N}\}$ over the alphabet $\Sigma = \{0\}$.
First, we want to show the following claim.

Claim 1 S is in L 

Proof. It suffices to design a log-space Turing machine that recognizes $S$. On input of the form $0^m$, the
desired machine writes $m$ in binary on its 1st work tape and 1 on its 2nd work tape using $O(\log m)$ cells. At
each round, it reads out a number, say, $n$ in binary on the 2nd tape and checks if the number on the 1st tape is a
multiple of $n$ using the 3rd work tape as a counter up to $n$. If not, then the machine immediately rejects the input. Otherwise,
it replaces the number on the 1st tape by its quotient by $n$ and increases $n$ by one (in binary) before entering the
next round. If the machine does not reject until the quotient reaches 1, it accepts the input. □
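
The round-by-round divisibility test can be sketched in Python as follows. This is a sketch under an explicit assumption: after each successful test the running value is replaced by its quotient, which is one standard way to recognize factorials. Python integers also hide the space accounting, so the code illustrates the arithmetic only, not the logarithmic space bound. The name `is_factorial` is hypothetical.

```python
def is_factorial(m: int) -> bool:
    """Return True iff m = n! for some n >= 0 (so 0^m belongs to S)."""
    if m < 1:
        return False
    n = 1
    while True:
        if m % n != 0:
            return False     # running value is not a multiple of n: reject
        m //= n              # keep only the quotient for the next round
        if m == 1:
            return True      # m was exactly 1 * 2 * ... * n
        n += 1               # next round tests divisibility by n + 1

# Lengths below 130 that belong to the length set of S.
factorial_lengths = [m for m in range(1, 130) if is_factorial(m)]
```

The loop always terminates: the running value never grows, and once $n$ exceeds it the divisibility test fails.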

Next, we want to show that no regular language can dissect $S$. Assume otherwise; that is, there exists
an infinite language $C \in$ REG over $\Sigma$ that dissects $S$. Lemma 3.7 guarantees the existence of a finite set $G$ for
which $LT(C) = \bigcup_{(a,b,k) \in G} A_{a,b,k}$. Without loss of generality, we can assume that $b < a$ for any $(a,b,k) \in G$.

Since $|C \cap S| = \infty$, there exists a triplet $(a,b,k) \in G$ such that $|\{m \mid \exists n \ge k\, [m! = an + b]\}| = \infty$.
Here, we claim that $b = 0$. If $an + b = m!$ for a certain large integer $m \ge a$, then $m! \equiv 0 \pmod{a}$. Since
$an + b \equiv b \pmod{a}$, it follows that $b \equiv 0 \pmod{a}$. Since $b < a$, $b$ must be zero, as required. Moreover, we
claim that $a > 1$. If $a = 1$, then $A_{1,b,k}$ equals $\{n + b \mid n \ge k\}$, which coincides with $\{n \mid n \ge k + b\}$. Thus,
$|\overline{A_{1,b,k}}| < \infty$, and therefore $|LT(S) \cap \overline{LT(C)}| < \infty$, a contradiction against $|\overline{C} \cap S| = \infty$.

Since $a > 1$ and $b = 0$, for a certain large constant $k'$, it holds that $\{m! \mid m \ge k'\} \subseteq A_{a,0,k}$. This implies
that $|LT(S) \cap \overline{LT(C)}| < \infty$, a contradiction. □
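
The arithmetic fact behind this proof is that $m! \equiv 0 \pmod{a}$ whenever $m \ge a \ge 1$, so the factorial lengths eventually stay inside a single residue class. A quick Python check, for illustration only (`factorial_residues` is a hypothetical helper):

```python
from math import factorial
from typing import List

def factorial_residues(a: int, upto: int) -> List[int]:
    """Return [m! mod a for m = 0, 1, ..., upto] for a modulus a >= 1."""
    return [factorial(m) % a for m in range(upto + 1)]

# For a = 6 the residues are nonzero only for the first few values of m.
residues = factorial_residues(6, 8)

# For every modulus a in 1..12, all residues from index a onward vanish.
tails_vanish = all(
    r == 0
    for a in range(1, 13)
    for r in factorial_residues(a, 15)[a:]
)
```

Any arithmetic progression that meets the factorials infinitely often must therefore have $b = 0$, and it then swallows all but finitely many of them.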



4 Basic Closure Properties of REG-Dissectability 

Before proceeding on a further exploration of the REG-dissectability of other languages, we quickly examine 
basic closure properties of the set of infinite REG-dissectable languages. For readability, we use the notation 
REG-DISSECT for the collection of all infinite languages that are REG-dissectable. Although this family 
REG-DISSECT is related to REG, it embodies clear traits that are quite different from those of REG. 
We begin with a simple observation. 

Lemma 4.1 The set REG-DISSECT is closed under concatenation, reversal, Kleene star, and union.

Proof. Let $L, L_1, L_2$ be any three languages in REG-DISSECT. [Union] Note that $L_1 \subseteq L_1 \cup L_2$. Since
$L_1$ is REG-dissectable, Lemma 3.5 implies that $L_1 \cup L_2$ is also REG-dissectable. [Reversal] For $L$, take
an infinite regular language $C$ that dissects $L$. Consider the reversal $C^R$. Obviously, this $C^R$ is regular and dissects $L^R$.
[Concatenation] Notice that the concatenation of $L_1$ and $L_2$ is $L_1L_2 = \{xy \mid x \in L_1, y \in L_2\}$. Fix any
string $y \in L_2$ and a regular language $C$ that dissects $L_1$. The regular language $C\{y\}$ then dissects $L_1\{y\}$. Since
$L_1\{y\} \subseteq L_1L_2$, by Lemma 3.5, the REG-dissectability of $L_1\{y\}$ leads to the REG-dissectability of $L_1L_2$. [Kleene
star] Note that the Kleene star $L^* = \bigcup_{i \ge 0} L^i$ contains $L$ as a subset. Apply Lemma 3.5 to obtain the
REG-dissectability of $L^*$. □

Although REG-DISSECT satisfies the closure properties listed in Lemma 4.1, it is not closed under
intersection. More strongly, REG-DISSECT is not closed under intersection even with regular languages.
This claim is shown below.

Lemma 4.2 REG-DISSECT is not closed under intersection with regular languages.

Proof. Let $\Sigma = \{0, 1\}$ be our alphabet. Consider the set $D = \{0^{n!} \mid n \ge 1\}$. As we have shown earlier,
$D$ is not REG-dissectable. Now, we define two sets $A = \{0\}^*$ and $B = D \cup \{1\}^*$. It is easy to dissect $A$ and
$B$ by the regular sets $C_A = \{0^{2m} \mid m \ge 0\}$ and $C_B = \{1^{2m} \mid m \ge 0\}$. Hence, $A$ and $B$ are REG-dissectable.
However, since $A \cap B = D \cup \{\lambda\}$ differs from $D$ only by the empty string, $A \cap B$ is not REG-dissectable.
Since $A$ is regular, REG-DISSECT is not closed under intersection with regular languages. □

We will show two more non-closure properties of REG-DISSECT. For any alphabet $\Sigma$, a homomorphism
$f$ is a map from $\Sigma$ to $\Sigma^*$. The domain of $f$ can be further expanded from $\Sigma$ to the whole set $\Sigma^*$ by
defining $f(\lambda) = \lambda$ and $f(x\sigma) = f(x)f(\sigma)$ for any $x \in \Sigma^*$ and $\sigma \in \Sigma$. Finally, set $f(L) = \{f(x) \mid x \in L\}$. A
homomorphism $f$ is called $\lambda$-free if $f(\sigma) \neq \lambda$ for every $\sigma \in \Sigma$. We say that a language family $\mathcal{F}$ is closed
under $\lambda$-free homomorphism if, for every language $L \in \mathcal{F}$ and every $\lambda$-free homomorphism $f$, $f(L)$ also
belongs to $\mathcal{F}$. Moreover, for two languages $L$ and $L'$ over the same alphabet, the quotient $L/L'$ is the set
$\{x \mid \exists y \in L'\, [xy \in L]\}$. We say that $\mathcal{F}$ is closed under quotient with regular languages if, for every set $L \in \mathcal{F}$
and every regular language $L'$, the quotient $L/L'$ is also in $\mathcal{F}$.

Lemma 4.3 REG-DISSECT is closed under neither $\lambda$-free homomorphism nor quotient with regular
languages.

Proof. (1) For the non-closure property under $\lambda$-free homomorphism, we define $L = \{1^{n!} \mid n \in \mathbb{N}\} \cup \{0^{n!} \mid
n \in \mathbb{N}\}$, which belongs to REG-DISSECT. Moreover, we define $h(0) = h(1) = 0$. Clearly, $h$ is a $\lambda$-free
homomorphism. The image set $h(L)$ equals $\{0^{n!} \mid n \in \mathbb{N}\}$, which can be proven to be non-REG-dissectable
by an argument similar to the proof of Proposition 3.6.

(2) Next, we consider the non-closure property under quotient. We define L = {0"T"' | n G N}U{1" 0"' | 
n G N} and L' = {!}'''. Obviously, L' is reg ular. Note that the quotient L/L' = {0"' | n G N}. As in (1), 
L/L' cannot be REG-dissectable. □ 



5 Semi-Linear Languages and REG-Dissectability 

Semi-linear languages are described by the behaviors of the number of occurrences of symbols in strings. This 
characteristic naturally makes those languages REG-dissectable. Under certain conditions, the complements 
as well as intersections of semi-linear languages are also dissected by regular languages. By stark contrast, 
without those conditions, they are no longer REG-dissectable in general. 

5.1 Semi-Linear Languages 

Parikh [10] discovered that the numbers of occurrences of symbols in each string of a context-free language $L$ must
satisfy certain linear Diophantine equations. This result inspires us to consider languages defined
by those linear equations. Here, we introduce a notion of "semi-linear" languages by the following matrix
formalism.

Firstly, we say that a subset $A$ of $\mathbb{N}^k$ is linear if there exist a number $m \in \mathbb{N}$ and an $(m+1) \times k$
non-negative integer matrix $T$ (called a critical matrix) satisfying: for every point $v \in \mathbb{N}^k$, $v \in A$ iff the equation
$(1, z_1, z_2, \ldots, z_m)T = v$ holds for a certain tuple (called a solution) $(z_1, z_2, \ldots, z_m) \in \mathbb{N}^m$. Equivalently, a
linear set is a coset of a finitely generated sub-semigroup of $\mathbb{N}^k$ for a certain $k \in \mathbb{N}$. A semi-linear set is a
union of finitely many linear sets. Note that the set of semi-linear subsets of $\mathbb{N}^k$ is closed under Boolean
operations [6]. Secondly, we expand the notion of semi-linearity into languages. Let $\Sigma = \{\sigma_1, \sigma_2, \ldots, \sigma_k\}$ be
an alphabet for $L$. For any string $x$, the point $(\#_{\sigma_1}(x), \#_{\sigma_2}(x), \ldots, \#_{\sigma_k}(x))$ in the space $\mathbb{N}^k$ is denoted $\Psi(x)$
and called Parikh's image of $x$. The commutative image (or Parikh's image) $\Psi(L)$ of $L$ is the collection of
all Parikh's images of strings in $L$. We say that the language $L$ is semi-linear if $\Psi(L)$ is semi-linear. Notice
that, since we are interested only in infinite languages, we always restrict our attention to the case of $T \neq O$
and implicitly assume that $T \neq O$ in the rest of this section. The notation SEMILIN denotes the set of all
semi-linear languages.

We note that every semi-linear language $L$ is constantly growing. This fact can be shown as follows.
Actually, we need to discuss only the case where $\Psi(L)$ is linear. Take an $(m+1) \times k$ critical matrix
$T = (d_{i,j})_{i,j}$ for $L$. For each index $i \in [m+1]$, let $e_i = \sum_{j=1}^{k} d_{i,j}$. For simplicity, assume that $e_i \neq 0$ for all $i$'s.
Let $u_0$ denote $e_1 + 1$ and consider all strings $w \in L$ with $|w| \ge u_0$. We define $K$ to be the set of all numbers
between 1 and $\max\{e_1, e_2, \ldots, e_{m+1}\}$. Let $\Psi(w) = (v_1, \ldots, v_k)$ and let $(z_1, z_2, \ldots, z_m)$ be any solution for the
equation $v = (1, z_1, z_2, \ldots, z_m)T$. Since $|w| = e_1 + \sum_{i=1}^{m} z_i e_{i+1} > e_1$, not all $z_i$'s are zeros. Choose an index
$i_0$ for which $z_{i_0} \neq 0$, and define $z'_{i_0} = z_{i_0} - 1$ and $z'_j = z_j$ for any other $j$'s. Finally, we take a string $x \in L$
satisfying that $|x| = e_1 + \sum_{i=1}^{m} z'_i e_{i+1}$. Clearly, $|w| = |x| + e_{i_0+1}$, and thus there exists a constant $c \in K$ for which $|w| = |x| + c$.
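
The length computation in this argument can be traced on a concrete critical matrix. The sketch below is illustrative only: the helper names are hypothetical, rows of $T$ are indexed from 0 in the code (row 0 is the constant row), and the toy matrix describes the commutative image $\{(n,n) \mid n \in \mathbb{N}\}$.

```python
from typing import List, Sequence, Tuple

def row_sums(T: Sequence[Sequence[int]]) -> List[int]:
    """Sum of each row of the critical matrix T."""
    return [sum(row) for row in T]

def length_from_solution(T: Sequence[Sequence[int]], z: Sequence[int]) -> int:
    """Length of any string whose Parikh image is (1, z_1, ..., z_m) T."""
    e = row_sums(T)
    return e[0] + sum(zi * ei for zi, ei in zip(z, e[1:]))

def shrink_solution(z: Sequence[int]) -> Tuple[Tuple[int, ...], int]:
    """Decrease the first nonzero coordinate of z by one.

    Returns the new solution together with the index that was decreased.
    """
    for i, zi in enumerate(z):
        if zi > 0:
            return tuple(z[:i]) + (zi - 1,) + tuple(z[i + 1:]), i
    raise ValueError("all coordinates are zero")

# Assumed toy matrix: constant row (0, 0) and one generator row (1, 1).
T = [[0, 0], [1, 1]]
z = (3,)
z_small, i0 = shrink_solution(z)
c = length_from_solution(T, z) - length_from_solution(T, z_small)
```

Here the length drops by exactly the row sum of the generator that was used one time fewer, which is the constant $c$ required by the constant growth property.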

Since semi-linear languages have the constant growth property, Lemma 3.3 therefore leads to the following
consequence.

Lemma 5.1 The language family SEMILIN is REG-dissectable.

5.2 Finite Intersections of Semi-Linear Languages

Since REG-DISSECT is closed under union, Lemma 5.1 implies that, for any two languages $L_1, L_2 \in$
SEMILIN, if $L_1 \cup L_2$ is infinite, then $L_1 \cup L_2$ is REG-dissectable. Next, let us consider a question of
whether the intersection of finitely many semi-linear languages is REG-dissectable. Under a certain
condition, it is possible to prove that this is indeed the case. For readability, we first focus on the intersection of
two semi-linear languages.

Lemma 5.2 For any two semi-linear languages $L_1$ and $L_2$, if $L_1 \cap L_2$ is infinite and $\Psi(L_1) \cap \Psi(L_2) \subseteq
\Psi(L_1 \cap L_2)$, then $L_1 \cap L_2$ is REG-dissectable.

Proof. Let $L_1$ and $L_2$ be any two semi-linear languages over a $k$-letter alphabet $\Sigma$, say, $\{\sigma_1, \sigma_2, \ldots, \sigma_k\}$.
Assume that the intersection $L_1 \cap L_2$ is infinite and that $\Psi(L_1) \cap \Psi(L_2) \subseteq \Psi(L_1 \cap L_2)$. Hereafter, we aim
at proving that $L_1 \cap L_2$ can be dissected by a certain regular language.

Consider any partition of $L_1$ (resp., $L_2$) as $L_1 = \bigcup_{i=1}^{s_1} A_i$ (resp., $L_2 = \bigcup_{i=1}^{s_2} B_i$) using languages
$A_1, A_2, \ldots, A_{s_1}$ (resp., $B_1, B_2, \ldots, B_{s_2}$) whose commutative images are linear sets. It also holds that
$\Psi(L_1) \cap \Psi(L_2) = \bigcup_{1 \le i \le s_1} \bigcup_{1 \le j \le s_2} (\Psi(A_i) \cap \Psi(B_j))$. Since $\Psi(L_1) \cap \Psi(L_2)$ is infinite, there exists a pair $(j_1, j_2) \in
[s_1] \times [s_2]$ that makes $\Psi(A_{j_1}) \cap \Psi(B_{j_2})$ infinite. Fix such a pair in the following argument.

By [9, Theorem 6], the intersection of finitely many linear sets can be expressed simply by an appropriate
semi-linear set. Hence, there is a series of languages $D_1, D_2, \ldots, D_s$ such that $\Psi(A_{j_1}) \cap \Psi(B_{j_2}) = \bigcup_{i=1}^{s} \Psi(D_i)$
and all $\Psi(D_i)$'s are linear sets. Now, we choose an index $j \in [s]$ for which $|\Psi(D_j)| = \infty$, and take an $(m+1) \times k$
critical matrix $T = (d_{i,l})_{i,l}$ for $D_j$.

For any point $v = (v_1, \ldots, v_k)$ in $\Psi(D_j)$, each element $v_l$ can be expressed as $v_l = d_{1,l} + \sum_{i=1}^{m} d_{i+1,l} z_i$
for a certain tuple $(z_1, \ldots, z_m) \in \mathbb{N}^m$. We choose an index $\ell$ for which $\sum_{i=1}^{m} d_{i+1,\ell} \neq 0$. Obviously, such
an index exists because, otherwise, $d_{i+1,\ell} = 0$ for all $i \in [m]$ and all $\ell$, and thus $\Psi(D_j)$ is finite, a contradiction. In
what follows, we fix this index $\ell$. For convenience, write $d'_i$ for $d_{i,\ell}$ and set $e = \sum_{i=1}^{m} d'_{i+1}$. Now, we define
$C_0 = \{x \in \Sigma^* \mid \#_{\sigma_\ell}(x) - d'_1 \equiv 0 \pmod{2e}\}$ and $C_1 = \{x \in \Sigma^* \mid \#_{\sigma_\ell}(x) - d'_1 \equiv e \pmod{2e}\}$. For
each string $x \in C_r$, there exists a number $u \in \mathbb{N}$ such that $\#_{\sigma_\ell}(x) - d'_1 = 2eu + er$, where $r \in \{0, 1\}$.
This is equivalent to $\#_{\sigma_\ell}(x) = d'_1 + \sum_{i=1}^{m} d'_{i+1}(2u + r)$. Since $(2u, \ldots, 2u)$ and $(2u+1, \ldots, 2u+1)$ are
legitimate choices of $(z_1, \ldots, z_m)$ for $\Psi(D_j)$, they generate two different points, say, $v_0$ and $v_1$ in $\Psi(D_j)$.
Since $\Psi(D_j) \subseteq \Psi(L_1) \cap \Psi(L_2) \subseteq \Psi(L_1 \cap L_2)$ by our assumption, two corresponding strings, say, $x_0$ and $x_1$
whose Parikh's images are respectively $v_0$ and $v_1$ belong to $L_1 \cap L_2$. Note that, for each $r \in \{0, 1\}$, $x_r$ also
belongs to $C_r$, and thus it is in $C_r \cap (L_1 \cap L_2)$. Since $u$ is arbitrary, it follows that $|C_r \cap (L_1 \cap L_2)| = \infty$.
Since $C_0 \cap C_1 = \emptyset$, $C_0$ dissects $L_1 \cap L_2$. □
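
The two languages $C_0$ and $C_1$ only count occurrences of one symbol modulo $2e$, which is why they are regular. A Python sketch under assumed toy parameters (symbol `0`, offset $d'_1 = 0$, and $e = 1$; all names are hypothetical):

```python
from typing import Callable

def make_counter_language(symbol: str, offset: int, e: int, r: int) -> Callable[[str], bool]:
    """Return a membership test for
    C_r = {x : #_symbol(x) - offset = r*e (mod 2e)}, with r in {0, 1}.

    The test simulates a finite automaton with 2e states.
    """
    modulus = 2 * e
    target = (r * e) % modulus

    def accepts(x: str) -> bool:
        state = (-offset) % modulus          # state tracks (count - offset) mod 2e
        for ch in x:
            if ch == symbol:
                state = (state + 1) % modulus
        return state == target

    return accepts

# Assumed toy instance: strings 0^n 1^n, counted symbol "0", offset 0, e = 1.
in_C0 = make_counter_language("0", 0, 1, 0)
in_C1 = make_counter_language("0", 0, 1, 1)
words = ["0" * n + "1" * n for n in range(1, 9)]
```

On this sample, the strings with an even number of 0s fall into $C_0$ and the others into $C_1$, so each side keeps receiving new members as $n$ grows.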

The argument used in the above proof can be easily extended from the intersection of two sets $\Psi(A_i)$ and $\Psi(B_j)$ to the intersection of an arbitrary number of sets. Therefore, we finally obtain the desired result stated below.

Proposition 5.3 Let $k$ be any number $\geq 2$. Let $L_1, L_2, \ldots, L_k$ be $k$ semi-linear languages. If $\bigcap_{i=1}^{k} L_i$ is infinite and $\bigcap_{i=1}^{k} \Psi(L_i) \subseteq \Psi(\bigcap_{i=1}^{k} L_i)$, then $\bigcap_{i=1}^{k} L_i$ is REG-dissectable.

Without the condition $\Psi(L_1) \cap \Psi(L_2) \subseteq \Psi(L_1 \cap L_2)$ in Lemma 5.2, we cannot prove that the intersection of two semi-linear languages is REG-dissectable. More precisely, let SEMILIN(2) be the language family SEMILIN $\wedge$ SEMILIN. To see that SEMILIN(2) is not REG-dissectable, let us consider the following example. Let $L_1 = \{0^n 1^n \mid n \in \mathbb{N}\}$ and $L_2 = \{1^n 0^n \mid n \in \mathbb{N}\} \cup \{0^{n!} 1^{n!} \mid n \in \mathbb{N}\}$. Since $\Psi(L_1) = \Psi(L_2) = \{(n, n) \mid n \in \mathbb{N}\}$, $L_1$ and $L_2$ are in SEMILIN. However, the intersection $L_1 \cap L_2 = \{0^{n!} 1^{n!} \mid n \in \mathbb{N}\}$, which belongs to
SEMILIN(2), can be shown to be non-REG-dissectable by an argument similar to the proof of Proposition 

5.3 Complements and Differences of Semi-Linear Languages 

Next, let us consider the complements of semi-linear languages. Unfortunately, the family co-SEMILIN is not REG-dissectable. This is easily seen as follows. Let $L = \{0^{n!} 1^{n!} \mid n \in \mathbb{N}\}$ be a language over $\Sigma = \{0, 1\}$. Since $\Psi(\overline{L}) = \mathbb{N}^2$, $\overline{L}$ is in SEMILIN; thus, $L$ belongs to co-SEMILIN. As noted in the previous subsection, $L$ is not REG-dissectable.

However, under an appropriate condition, the complements of semi-linear languages are proven to be 
REG-dissectable. 

Lemma 5.4 Let $L$ be any co-infinite semi-linear language over an alphabet $\Sigma = \{\sigma_1, \ldots, \sigma_k\}$. If $\Psi(L) \neq_{ae} \mathbb{N}^k$, then the complement of $L$ is REG-dissectable.

Proof. Let $L \in$ SEMILIN be any co-infinite language over an alphabet $\Sigma = \{\sigma_1, \sigma_2, \ldots, \sigma_k\}$. We first partition $L$ into $A_1, A_2, \ldots, A_s$ whose commutative images are linear sets. Clearly, it holds that $\mathbb{N}^k - \Psi(L) = \bigcap_{i=1}^{s} (\mathbb{N}^k - \Psi(A_i))$. For each index $j \in [s]$, take an $(m+1) \times k$ critical matrix $T_j$ for $A_j$ and let $T_j = (d^{(j)}_{i,l})_{i,l}$. Since $\Psi(L) \neq_{ae} \mathbb{N}^k$, $\Psi(A_i) \neq_{ae} \mathbb{N}^k$ follows for each index $i \in [s]$. Note that $v \in \mathbb{N}^k - \Psi(A_j)$ iff $v \neq (1, z_1, \ldots, z_m) T_j$ holds for all tuples $(z_1, \ldots, z_m) \in \mathbb{N}^m$.

Here, we introduce new notations $T_j^{(-\ell)}$ and $V_\ell$. For indices $\ell \in [k]$ and $j \in [s]$, the notation $T_j^{(-\ell)}$ denotes the matrix obtained from $T_j$ by deleting the $\ell$th column, and $V_\ell$ denotes the set of all points $v \in \mathbb{N}^{k-1}$ that satisfy the following condition: for every index $j \in [s]$ and for every tuple $(z_1, \ldots, z_m) \in \mathbb{N}^m$, it holds that $v \neq (1, z_1, \ldots, z_m) T_j^{(-\ell)}$. Moreover, for convenience, let $\overline{V_\ell}$ stand for the set $\mathbb{N}^{k-1} - V_\ell$.

(1) Assume that $V_\ell$ is infinite for a certain index $\ell \in [k]$. Fix such an index $\ell$ and choose an arbitrary point $v = (v_1, \ldots, v_{\ell-1}, v_{\ell+1}, \ldots, v_k)$ in $V_\ell$. By the definition of $V_\ell$, it follows that, for any number $d \in \mathbb{N}$, a point $v^{(d)} = (v_1, \ldots, v_{\ell-1}, d, v_{\ell+1}, \ldots, v_k)$ induced from $v$ always belongs to $\mathbb{N}^k - \Psi(A_j)$ for all indices $j \in [s]$. Take a string $w_d$ whose Parikh's image is exactly $v^{(d)}$. In particular, we have $\#_{\sigma_\ell}(w_d) = d$. It holds that $w_d \in \overline{L}$, because $w_d \in L$ implies $v^{(d)} = \Psi(w_d) \in \Psi(L)$, a contradiction. Since $d$ is arbitrary, it suffices to define a regular set $C$ as $C = \{x \mid \#_{\sigma_\ell}(x) \equiv 0 \pmod{2}\}$. We wish to show that $|C \cap \overline{L}| = |\overline{C} \cap \overline{L}| = \infty$. When $d$ is even, $\#_{\sigma_\ell}(w_d) = d$ implies $\#_{\sigma_\ell}(w_d) \equiv 0 \pmod{2}$; thus, $w_d \in C$. This yields the membership $w_d \in C \cap \overline{L}$. Similarly, when $d$ is odd, we obtain $\#_{\sigma_\ell}(w_d) \equiv 1 \pmod{2}$, implying $w_d \in \overline{C}$. Hence, $w_d$ should be in $\overline{C} \cap \overline{L}$. Since $d$ is arbitrary, it follows that $|C \cap \overline{L}| = |\overline{C} \cap \overline{L}| = \infty$.

(2) Assume that all $V_\ell$'s are finite. In particular, $\overline{V_1}$ should be infinite. Now, we fix an arbitrary point $v = (v_2, v_3, \ldots, v_k)$ in $\overline{V_1}$ and consider the set $B$ of all possible choices $(j, z_1, \ldots, z_m) \in [s] \times \mathbb{N}^m$ that satisfy the equation $v = (1, z_1, \ldots, z_m) T_j^{(-1)}$. Since there are only a constant number of such choices, $B$ should be finite. Next, we define a set $D$ of integers as $D = \{e \in \mathbb{N} \mid e = (1, z_1, \ldots, z_m) T_j[1],\ (j, z_1, \ldots, z_m) \in [s] \times \mathbb{N}^m\}$, where $T_j[1]$ denotes the first column vector of $T_j$. Obviously, this set $D$ is finite. We wish to claim that, for every number $e' \in \mathbb{N} - D$, a point $v' = (e', v_2, \ldots, v_k)$ falls into the set $\mathbb{N}^k - \Psi(L)$. To show this claim, we assume otherwise. There exists a particular choice $(j', z'_1, \ldots, z'_m)$ satisfying $v' = (1, z'_1, \ldots, z'_m) T_{j'}$, which implies $v = (1, z'_1, \ldots, z'_m) T_{j'}^{(-1)}$. In other words, $(j', z'_1, \ldots, z'_m)$ belongs to $B$; thus, we conclude that $e' \in D$, a contradiction. Similar to (1), we define $C = \{x \mid \#_{\sigma_1}(x) \equiv 0 \pmod{2}\}$. It is not difficult to show that $|C \cap \overline{L}| = |\overline{C} \cap \overline{L}| = \infty$. □
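To make case (1) of the proof concrete, the following sketch builds a string $w_d$ from a point whose $\ell$th coordinate is left free and reports on which side of the parity language $C$ it falls. The three-letter alphabet, the fixed coordinates $(4, 1)$, and all helper names are assumptions introduced only for illustration.

```python
from collections import Counter

ALPHABET = ["a", "b", "c"]  # hypothetical alphabet sigma_1, sigma_2, sigma_3


def parikh(w: str) -> tuple:
    """Parikh's image of w: occurrence counts listed in the order of ALPHABET."""
    counts = Counter(w)
    return tuple(counts[s] for s in ALPHABET)


def string_with_image(v: tuple) -> str:
    """One canonical string whose Parikh's image is v (one block per symbol)."""
    return "".join(s * n for s, n in zip(ALPHABET, v))


def in_parity_language(x: str, ell: int) -> bool:
    """Membership in C = {x : #sigma_ell(x) is even}; ell is 1-based."""
    return x.count(ALPHABET[ell - 1]) % 2 == 0


def induced_point(v_rest: tuple, ell: int, d: int) -> tuple:
    """Insert d as the ell-th coordinate of a (k-1)-dimensional point."""
    return v_rest[: ell - 1] + (d,) + v_rest[ell - 1:]


# With ell = 2 and fixed coordinates (4, 1), even d lands in C and odd d outside C.
words = [string_with_image(induced_point((4, 1), 2, d)) for d in range(6)]
sides = [in_parity_language(w, 2) for w in words]
```

As `d` ranges over the natural numbers, `sides` alternates between the two sides of $C$, which is the reason both $C \cap \overline{L}$ and $\overline{C} \cap \overline{L}$ receive infinitely many strings in the argument above.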

Inspired by the arguments used in the proofs of Lemmas 5.2 and 5.4, we further prove that, under a certain condition described in the following proposition, the difference between two semi-linear languages is REG-dissectable as long as the difference forms an infinite set.

Proposition 5.5 Let $L_1$ and $L_2$ be any two infinite semi-linear languages satisfying $\Psi(L_1) \not\subseteq_{ae} \Psi(L_2)$. If $\Psi(L_1) - \Psi(L_2) \subseteq \Psi(L_1 - L_2)$ holds, then the difference $L_1 - L_2$ is REG-dissectable.

Proof. Let $L_1$ and $L_2$ be infinite languages in SEMILIN over $\Sigma$, where $\Sigma = \{\sigma_1, \ldots, \sigma_k\}$. Consider a partition $A_1, A_2, \ldots, A_{s_1}$ of $L_1$ so that $\Psi(L_1) = \bigcup_{i=1}^{s_1} \Psi(A_i)$ and the $\Psi(A_i)$'s are linear sets. Note that $\Psi(L_1) - \Psi(L_2) = \bigcup_{i=1}^{s_1} (\Psi(A_i) - \Psi(L_2))$. Since REG-DISSECT is closed under union by Lemma 3.5, it suffices to focus on the difference $\Psi(A_i) - \Psi(L_2)$. For notational simplicity, we henceforth assume that $L_1 = A_1$.

Take a critical matrix $T$ for $L_1$ and $s$ critical matrices $S_1, S_2, \ldots, S_s$ for $L_2$, where $s \geq 1$. Let $T = (d_{i,l})_{i,l}$ and $S_j = (e^{(j)}_{i,l})_{i,l}$ for each index $j \in [s]$. Using $m_1 + m_2$ variables $z_1, \ldots, z_{m_1}, w_1, \ldots, w_{m_2}$ over $\mathbb{N}$, for each fixed index $j \in [s]$, we consider a matrix equation $(1, z_1, \ldots, z_{m_1}) T = (1, w_1, \ldots, w_{m_2}) S_j$, which is equivalent to a set of $k$ linear Diophantine equations: $d_{1,q} + \sum_{i=1}^{m_1} d_{i+1,q} z_i = e^{(j)}_{1,q} + \sum_{i=1}^{m_2} e^{(j)}_{i+1,q} w_i$, where $q$ ranges over the index set $[k]$. If $T$ satisfies that, for a certain index $i$, $d_{i+1,q} = 0$ for all $q \in [k]$, then a point $(1, z_1, \ldots, z_{m_1}) T$ does not depend on the choice of $z_i$. To keep our proof simple, we assume that $T$ does not satisfy this property.

Hereafter, we discuss the case where $m_2 < m_1$. To ease notational complication, we assume that, in the above set of equations, $w_1, w_2, \ldots, w_{m_2}$ as well as $z_{r+1}, z_{r+2}, \ldots, z_{m_1}$ ($1 \leq r \leq m_1$) are free variables and the remainders, $z_1, z_2, \ldots, z_r$, are bound variables. In the case of $m_1 \leq m_2$, similarly, we set $r = m_1$. With the help of those free variables, each bound variable $z_\ell$ ($\ell \in [r]$) can be expressed in the form of a linear polynomial, say, $p^{(j)}_\ell(w_1, \ldots, w_{m_2}, z_{r+1}, \ldots, z_{m_1})$ with rational coefficients.

Now, we define a set $D_\ell$ for each index $\ell \in [r]$ as

$D_\ell = \{(z_1, \ldots, z_{\ell-1}, z_{\ell+1}, \ldots, z_r) \in \mathbb{N}^{r-1} \mid \forall j \in [s]\ \forall w_1, \ldots, w_{m_2} \in \mathbb{N}\ \forall z_{r+1}, \ldots, z_{m_1} \in \mathbb{N}\ \exists i \in [r] - \{\ell\}\ \text{s.t.}\ z_i \neq p^{(j)}_i(w_1, \ldots, w_{m_2}, z_{r+1}, \ldots, z_{m_1})\}$.

In what follows, we will examine two cases separately.

(1) Assume that $D_\ell \neq \emptyset$ holds for a certain index $\ell \in [r]$. We fix such an index $\ell$ and choose an element $(z'_1, \ldots, z'_{\ell-1}, z'_{\ell+1}, \ldots, z'_r)$ from $D_\ell$. For any number $d \in \mathbb{N}$, the notation $v^{(d)}$ expresses the vector $(1, z'_1, \ldots, z'_{\ell-1}, d, z'_{\ell+1}, \ldots, z'_r, 0, \ldots, 0) T$. By the definition of $D_\ell$, it follows that $v^{(d)} \neq (1, w_1, \ldots, w_{m_2}) S_j$ for any $j \in [s]$ and for any tuple $(w_1, \ldots, w_{m_2}) \in \mathbb{N}^{m_2}$. Therefore, $v^{(d)}$ falls into $\Psi(L_1) - \Psi(L_2)$. By our assumption of $\Psi(L_1) - \Psi(L_2) \subseteq \Psi(L_1 - L_2)$, this implies $v^{(d)} \in \Psi(L_1 - L_2)$; thus, $L_1 - L_2$ contains a string $x$ for which $\Psi(x) = v^{(d)}$. In particular, we choose $2u$ and $2u+1$ as two candidates for $d$, where $u$ represents a free variable, and we fix an index $q \in [k]$ satisfying $d_{\ell+1,q} \neq 0$. Note that such $q$ exists by our choice of $T$.

Assume that $v^{(d)}$ is of the form $(v_1, \ldots, v_k)$ and, moreover, set $\hat{d} = \sum_{1 \leq i \leq r,\, i \neq \ell} d_{i+1,q} z'_i$. When $d = 2u$, we obtain $v_q = d_{1,q} + \hat{d} + 2u\, d_{\ell+1,q}$, and thus $v_q - d_{1,q} - \hat{d} \equiv 0 \pmod{2 d_{\ell+1,q}}$ holds. On the contrary, if $d = 2u+1$, then we obtain $v_q - d_{1,q} - \hat{d} \equiv d_{\ell+1,q} \pmod{2 d_{\ell+1,q}}$. Hence, $d = 2u$ and $d = 2u+1$ produce two different equations modulo $2 d_{\ell+1,q}$. Since $u$ is arbitrary, the set $C = \{x \in \Sigma^* \mid \#_{\sigma_q}(x) - d_{1,q} - \hat{d} \equiv 0 \pmod{2 d_{\ell+1,q}}\}$ dissects $L_1 - L_2$.

(2) Assume that $D_\ell = \emptyset$ for all $\ell \in [r]$. Choose a pair $(\ell, q) \in [r] \times [k]$ satisfying that $\sum_{1 \leq i \leq r,\, i \neq \ell} d_{i+1,q} \neq 0$. Such a pair actually exists because, otherwise, we obtain $d_{\ell+1,q} = 0$ for all pairs $(\ell, q)$ and this makes $\Psi(L_1)$ finite, a contradiction.

For simplicity, set $\hat{d} = \sum_{1 \leq i \leq r,\, i \neq \ell} d_{i+1,q}$, which is nonzero. Our assumption implies the existence of a certain value $z'_\ell$ that satisfies the following condition: for every $(w_1, \ldots, w_{m_2}) \in \mathbb{N}^{m_2}$ and for every $(z_1, \ldots, z_{\ell-1}, z_{\ell+1}, \ldots, z_{m_1}) \in \mathbb{N}^{m_1 - 1}$, if $z_i = p^{(j)}_i(w_1, \ldots, w_{m_2}, z_{r+1}, \ldots, z_{m_1})$ for all $i \neq \ell$, then $z'_\ell \neq p^{(j)}_\ell(w_1, \ldots, w_{m_2}, z_{r+1}, \ldots, z_{m_1})$. Depending on a number $d \in \mathbb{N}$, we use the abbreviation $v^{(d)}$ for the point $(1, z_1, \ldots, z_{m_1}) T$, where we set $z_\ell = z'_\ell$, $z_i = d$ for any $i \in [r] - \{\ell\}$, and $z_i = 0$ for all $i$'s with $r+1 \leq i \leq m_1$.

Now, we take $2u$ and $2u+1$ as two different values of $d$, where $u$ is a free variable. Assume that $v^{(d)}$ is of the form $(v_1, \ldots, v_k)$. Similar to the argument in (1), $d = 2u$ implies $v_q = d_{1,q} + 2u\hat{d} + d_{\ell+1,q} z'_\ell$, whereas $d = 2u+1$ implies $v_q = d_{1,q} + 2u\hat{d} + \hat{d} + d_{\ell+1,q} z'_\ell$. To obtain the desired dissection, it suffices to define $C = \{x \mid \#_{\sigma_q}(x) - d_{1,q} - d_{\ell+1,q} z'_\ell \equiv 0 \pmod{2\hat{d}}\}$. This set $C$ dissects $L_1 - L_2$. □

6 Context-Free Languages and Bounded Languages 

Context-free languages are an important example of semi-linear languages [10]. The semi-linear nature of context-free languages will be fully exploited in certain cases of the REG-dissectability proofs later in this section. Meanwhile, we set our focal point at the REG-dissectability of CFL $\cup$ co-CFL.

Proposition 6.1 The language family CFL $\cup$ co-CFL is REG-dissectable.

Proof. (1) Since CFL $\subseteq$ SEMILIN, it immediately follows from the REG-dissectability of SEMILIN (shown in Section 5) that CFL is REG-dissectable.

(2) Next, we wish to show that co-CFL is also REG-dissectable. Let $L$ be any infinite language in co-CFL. Let $\Sigma = \{\sigma_1, \ldots, \sigma_k\}$ be an alphabet for $L$. We want to show that (*) there exists an infinite subset $S$ of $L$ that is constantly growing. This implies that $L$ is not CGL-immune. Proposition 3.4 thus implies the REG-dissectability of $L$, as required.

To show Statement (*), we need the following form of a pumping lemma for co-CFL, which is a direct consequence of a pumping lemma† for CFL, given in [1]. This lemma, however, holds only for infinite languages. For completeness, we include its proof.

Lemma 6.2 [Pumping Lemma for co-CFL] Let $L$ be any infinite language in co-CFL. There exists a constant $p$ that satisfies the following: for every string $w \in L$ with $|w| \geq p$, there exist strings $u, v, x, y, z$ such that (i) $1 \leq |vy| \leq p$, (ii) $w = uxz$, and (iii) $w' = uvxyz$ is in $L$.

Proof. If $\overline{L}$ is finite, then $L =_{ae} \Sigma^*$ and the lemma is trivially true. Hence, we assume that $\overline{L}$ is infinite. Since $\overline{L}$ is in CFL, we apply the pumping lemma for CFL. Take a pumping constant $p$ and let $w$ be any string in $L$ with $|w| \geq p$. Consider a finite set $A_w = \{uvxyz \mid w = uxz,\ 1 \leq |vy| \leq p\}$ generated from $w$. It suffices to show that $A_w \not\subseteq \overline{L}$. Now, assume otherwise; that is, $A_w \subseteq \overline{L}$. We then apply the pumping lemma for CFL to every string $r$ in $A_w$. Since $r \in A_w$, there are strings $u, v, x, y, z$ such that $r = uvxyz$ and $r' = uxz \in \overline{L}$. Since $1 \leq |vy| \leq p$, for a certain string $r$, $r'$ coincides with $w$. Thus, we conclude that $w \in \overline{L}$, a contradiction. Therefore, $A_w \not\subseteq \overline{L}$ follows, as required. □

We return to the proof of Proposition 6.1. Let us choose a pumping constant $p$ given in Lemma 6.2. This lemma produces an infinite sequence $S = \{w_1, w_2, \ldots\}$ in $L$ such that, for every index $i$, $|w_{i+1}| = |w_i| + c_i$ holds for a certain number $c_i \in [p]$. Clearly, $S$ is constantly growing. This completes the proof. □
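The sequence $S$ constructed above grows by bounded steps. The following small sketch checks this property on a finite prefix of string lengths, assuming that $[p]$ denotes $\{1, \ldots, p\}$. The function name and the sample data are hypothetical, and a finite check of this kind can only refute the property, never establish it for an infinite language.

```python
def grows_by_bounded_steps(lengths: list, p: int) -> bool:
    """Check that every consecutive gap c_i = |w_{i+1}| - |w_i| lies in {1, ..., p}."""
    return all(1 <= b - a <= p for a, b in zip(lengths, lengths[1:]))


# Lengths n^2 have gaps 2n + 1, which eventually exceed any fixed p,
# whereas the second sequence keeps every gap at most 3.
squares = [n * n for n in range(1, 8)]       # 1, 4, 9, 16, 25, 36, 49
steady = [5, 7, 8, 11, 13, 14, 17]
```

With these sample inputs, `steady` passes the check for $p = 3$ while `squares` fails it for $p = 5$, since the gap between 9 and 16 already exceeds 5.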

To utilize proof techniques developed for semi-linear languages in Section 5, we focus our attention on a restricted part of context-free languages. A language $L$ over an alphabet $\Sigma$ is said to be bounded if there are fixed "non-empty" strings $w_1, w_2, \ldots, w_m$ in $\Sigma^*$ such that $L$ is a subset of the set $L[w_1, w_2, \ldots, w_m] =_{def} \{w_1^{i_1} w_2^{i_2} \cdots w_m^{i_m} \mid i_1, i_2, \ldots, i_m \in \mathbb{N}\}$ [6]. Bounded languages have been frequently used in proofs of class separations: for instance, the separation between CFL($k$) and CFL($k+1$) for every $k \geq 1$ [9].

For readability, we denote by BCFL the family of all bounded context-free languages. Analogous to CFL($k$) and CFL$_k$, we can define BCFL($k$) and BCFL$_k$ as well. Liu and Weiner [5] actually proved that the class $\{\mathrm{BCFL}(k) \mid k \in \mathbb{N}^+\}$ forms an infinite hierarchy within the class of context-sensitive languages. Furthermore, we extend Parikh's images as follows: let the extended Parikh's image $\widehat{\Psi}(w)$ be $\{(i_1, i_2, \ldots, i_m) \in \mathbb{N}^m \mid w = w_1^{i_1} w_2^{i_2} \cdots w_m^{i_m}\}$ for any $w \in L[w_1, \ldots, w_m]$. Notice that $\widehat{\Psi}(w)$ generally forms a "set" because $w$ may have more than one expression of the form $w_1^{i_1} w_2^{i_2} \cdots w_m^{i_m}$. Finally, we define $\widehat{\Psi}(L) = \bigcup_{w \in L} \widehat{\Psi}(w)$ for every bounded language $L$.
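The remark that the extended Parikh's image of a single string is in general a set can be checked mechanically. Below is a minimal sketch that enumerates every exponent tuple $(i_1, \ldots, i_m)$ with $w = w_1^{i_1} \cdots w_m^{i_m}$ by a simple recursive search. The function name `extended_parikh` and the sample base strings are assumptions for illustration; the search is a straightforward enumeration and is not claimed to be the construction used in the cited literature.

```python
def extended_parikh(w: str, bases: list) -> set:
    """All tuples (i_1, ..., i_m) with w == bases[0]*i_1 + ... + bases[m-1]*i_m.

    Every base string must be non-empty, as in the definition of bounded languages.
    """
    if any(not b for b in bases):
        raise ValueError("base strings must be non-empty")

    def solve(pos: int, j: int) -> set:
        # Exponent tuples for bases[j:] that spell out w[pos:] exactly.
        if j == len(bases):
            return {()} if pos == len(w) else set()
        results = set()
        base, count = bases[j], 0
        while True:
            # Try to finish with the remaining bases after `count` copies of base.
            for tail in solve(pos, j + 1):
                results.add((count,) + tail)
            if w.startswith(base, pos):
                pos += len(base)
                count += 1
            else:
                return results

    return solve(0, 0)
```

For instance, with the hypothetical base strings `["a", "aa"]`, the string `"aaa"` has the two expressions $a^1 (aa)^1$ and $a^3 (aa)^0$, so `extended_parikh("aaa", ["a", "aa"])` contains both `(1, 1)` and `(3, 0)`. The image of a whole finite sample of a language can be obtained by taking the union of the results over its strings.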

For bounded languages, $\widehat{\Psi}$ works as $\Psi$. By extending a result of [7], Ginsburg [5] presented a close relationship between bounded context-free languages $L$ and the semi-linearity of $\widehat{\Psi}(L)$. What we need here is a slightly weaker form of [5, Theorem 5.4.2], as stated below.

Lemma 6.3 For any subset $L$ of $L[w_1, \ldots, w_k]$, if $L \in$ CFL, then $\widehat{\Psi}(L)$ is semi-linear.

Theorem 6.4 For any number $k \geq 2$, BCFL($k$) is REG-dissectable.

Proof. Let $L' = L[w_1, w_2, \ldots, w_m]$ and let $L_1, L_2, \ldots, L_k$ be any $k$ subsets of $L'$ in BCFL. Assume that $L = \bigcap_{i=1}^{k} L_i$ is an infinite set. First, we want to claim the following.

Claim 2 $\bigcap_{i=1}^{k} \widehat{\Psi}(L_i) \subseteq \widehat{\Psi}(\bigcap_{i=1}^{k} L_i)$.

Proof. Let $v$ be any point in $\bigcap_{i=1}^{k} \widehat{\Psi}(L_i)$ and fix $i \in [k]$ arbitrarily. By the definition of $\widehat{\Psi}$, there is a unique string $w$ in $L'$ such that $v \in \widehat{\Psi}(w)$. Since $v \in \widehat{\Psi}(L_i)$, $w$ should belong to $L_i$. Since $i$ is arbitrary, we conclude that $w \in \bigcap_{i=1}^{k} L_i$. It thus follows that $v \in \widehat{\Psi}(w) \subseteq \widehat{\Psi}(\bigcap_{i=1}^{k} L_i)$. □

By viewing $w_1, \ldots, w_m$ as "different" symbols $\sigma_1, \ldots, \sigma_m$ as done in [5], Lemma 6.3 makes it possible for us to exploit a similarity between $\widehat{\Psi}(w)$ and $\Psi(w)$. Therefore, the same type of argument developed for the proof of Lemma 5.2 can prove that $L$ is indeed REG-dissectable. □

† [Pumping Lemma for CFL] For any language $L \in$ CFL, there exists a constant (called a pumping constant) $p$ such that, for every $w \in L$ with $|w| \geq p$, $w$ can be decomposed as $w = uvxyz$ such that $|vxy| \leq p$, $|vy| \geq 1$, and $w^{(i)} = uv^i x y^i z$ is in $L$ for every $i \in \mathbb{N}$.

Next, we discuss the REG-dissectability of the difference of two bounded context-free languages. 

Proposition 6.5 The family BCFL$_2$ is REG-dissectable.

Proof. Let $L_1$ and $L_2$ be any two languages in BCFL and assume that $L_1 - L_2$ is infinite. If $L_2$ is finite, then the proposition is trivially true. Now, we assume that $L_2$ is infinite. We claim the following statement.

Claim 3 $\widehat{\Psi}(L_1) - \widehat{\Psi}(L_2) \subseteq \widehat{\Psi}(L_1 - L_2)$.

Proof. Let $v \in \widehat{\Psi}(L_1) - \widehat{\Psi}(L_2)$. Note that $v \notin \widehat{\Psi}(L_2)$. Since $v \in \widehat{\Psi}(L_1)$, there exists a string $w \in L_1$ such that $v \in \widehat{\Psi}(w)$. If $w \in L_2$, then we obtain $v \in \widehat{\Psi}(w) \subseteq \widehat{\Psi}(L_2)$, a contradiction against $v \notin \widehat{\Psi}(L_2)$. Hence, $w \notin L_2$ follows. Since $w \in L_1 - L_2$, we obtain $v \in \widehat{\Psi}(w) \subseteq \widehat{\Psi}(L_1 - L_2)$, as required. □

Similar to the proof of Theorem 6.4, the use of similarity between $\widehat{\Psi}(w)$ and $\Psi(w)$ helps apply an argument used for the proof of Proposition 5.5 to the REG-dissectability of $L_1 - L_2$. □

Finally, we extend the above result regarding BCFL$_2$ to the entire Boolean hierarchy over BCFL, denoted BCFL$_{BH}$, where BCFL$_{BH}$ is defined in a similar fashion to CFL$_{BH}$.

Theorem 6.6 The Boolean hierarchy BCFL$_{BH}$ is REG-dissectable.

The starting point of the proof of the above theorem has already been established in Proposition 6.5 because BCFL$_2$ consists of the differences of two bounded context-free languages.



Proof of Theorem 6.6. Since $\mathrm{BCFL}_{2k-1} \subseteq \mathrm{BCFL}_{2k}$ for every $k \geq 2$, it is sufficient to prove that $\mathrm{BCFL}_{2k}$ is REG-dissectable for every $k \geq 1$. We show this claim by induction on $k$. Notice that the basis case has been shown in Proposition 6.5. Now, let $k \geq 2$ and consider the family $\mathrm{BCFL}_{2k}$. First, we show a simple fact regarding even levels of the Boolean hierarchy BCFL$_{BH}$.

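Claim 4 For every index $k \geq 2$, $\mathrm{BCFL}_{2k} = \mathrm{BCFL}_{2k-2} \vee \mathrm{BCFL}_2$.

Proof. Here, we want to claim that (*) for every index $k \geq 2$, $\mathrm{BCFL}_{2k-2} \wedge \text{co-BCFL} = \mathrm{BCFL}_{2k-2}$. Let $\mathcal{F} = \mathrm{BCFL}_{2k-2} \wedge \text{co-BCFL}$. Since $\mathrm{BCFL}_{2k-2} = \mathrm{BCFL}_{2k-3} \wedge \text{co-BCFL}$ by the definition, $\mathcal{F}$ equals $\mathrm{BCFL}_{2k-3} \wedge (\text{co-BCFL} \wedge \text{co-BCFL})$, which is actually $\mathrm{BCFL}_{2k-3} \wedge \text{co-}(\mathrm{BCFL} \vee \mathrm{BCFL})$. Since BCFL is closed under union, we have $\mathrm{BCFL} \vee \mathrm{BCFL} = \mathrm{BCFL}$. Hence, it follows that $\mathcal{F} = \mathrm{BCFL}_{2k-3} \wedge \text{co-BCFL}$; by the definition again, the right-hand side equals $\mathrm{BCFL}_{2k-2}$. Therefore, Statement (*) holds.

Next, by the definition, we have $\mathrm{BCFL}_{2k} = \mathrm{BCFL}_{2k-1} \wedge \text{co-BCFL}$, which equals $(\mathrm{BCFL}_{2k-2} \vee \mathrm{BCFL}) \wedge \text{co-BCFL}$. By the distributive law, it holds that $\mathrm{BCFL}_{2k} = (\mathrm{BCFL}_{2k-2} \wedge \text{co-BCFL}) \vee (\mathrm{BCFL} \wedge \text{co-BCFL})$. Using the equation (*), we obtain $\mathrm{BCFL}_{2k} = \mathrm{BCFL}_{2k-2} \vee \mathrm{BCFL}_2$. □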

By the induction hypothesis, $\mathrm{BCFL}_{2k-2}$ is REG-dissectable. Since $\mathrm{BCFL}_2$ and $\mathrm{BCFL}_{2k-2}$ are both REG-dissectable, Lemma 3.5 draws a conclusion that the family $\mathrm{BCFL}_{2k-2} \vee \mathrm{BCFL}_2$ is also REG-dissectable. By Claim 4, this family is exactly $\mathrm{BCFL}_{2k}$. Therefore, $\mathrm{BCFL}_{2k}$ is REG-dissectable, as required for the induction. This completes the proof of Theorem 6.6. □



7 Application: Separation with Infinite Margins 

We seek an immediate application of our result regarding the REG-dissectability of languages. To describe this application, we introduce extra terminology. Given two infinite sets $A$ and $B$, we say that $A$ covers $B$ with an infinite margin (or $A$ is an i-cover of $B$, in short) if $B \subseteq A$ and $A \neq_{ae} B$. When $A$ i-covers $B$, we briefly write $(B, A)$ and call it an i-covering pair. A language $C$ is said to separate $(B, A)$ with infinite margins (or i-separate $(B, A)$, in short) if (i) $B \subseteq C \subseteq A$, (ii) $A \neq_{ae} C$, and (iii) $B \neq_{ae} C$. For convenience, we use the notation $(\mathcal{D}, \mathcal{C})$ for two language families $\mathcal{C}$ and $\mathcal{D}$ to denote the set of all i-covering pairs $(B, A)$ with $A \in \mathcal{C}$ and $B \in \mathcal{D}$. We say that a language family $\mathcal{E}$ i-separates $(\mathcal{D}, \mathcal{C})$ if, for every pair $(B, A) \in (\mathcal{D}, \mathcal{C})$, there exists a set $C \in \mathcal{E}$ that i-separates $(B, A)$.

As the starting point, by a direct construction of appropriate languages, we intend to show that CFL/n 
i-separates (CFL,CFL). 

Proposition 7.1 The language family CFL/n i-separates (CFL,CFL). 

Proof. Let $(B, A)$ be any i-covering pair in (CFL, CFL). Since $A - B$ is infinite, we can choose an infinite series $S = \{w_1, w_2, \ldots\} \subseteq A - B$ of strings of different lengths. Moreover, we demand that $A - B \not\subseteq_{ae} S$. Now, we define an advice function $f$ as $f(n) = 1^n$ if $n = |w|$ for a certain string $w \in S$, and $f(n) = 0^n$ otherwise. Next, we make a dfa $M$ behave as follows: on input $x$ of length $n$ with advice string $f(n)$, first check if $n > 0$ and $f(n) = 1^n$; if this is indeed the case, $M$ accepts the input; otherwise, it rejects the input. Let $C$ be the set of all input strings that are accepted by $M$ when the advice function $f$ is given. Finally, we define $C' = B \cup (A \cap C)$, which belongs to CFL/n. It is not difficult to show that $C'$ i-separates $(B, A)$. □

Now, we want to apply the REG-dissection results of the previous sections to obtain several i-separation 
results. The following is a key lemma that bridges between REG-dissectability and i-separation. 

Lemma 7.2 Let $\mathcal{C}, \mathcal{D}$ be any two language families. Assume that $\mathcal{C} - \mathcal{D}$ is REG-dissectable. It then holds that, for any $A \in \mathcal{C}$ and any $B \in \mathcal{D}$, if $A$ i-covers $B$, then there exists a language in $\mathcal{E}$ that i-separates $(B, A)$, where $\mathcal{E} = \{B \cup (A \cap C) \mid A \in \mathcal{C}, B \in \mathcal{D}, C \in \mathrm{REG}\}$. Hence, $\mathcal{E}$ i-separates $(\mathcal{D}, \mathcal{C})$.

Proof. Let $A \in \mathcal{C}$ and $B \in \mathcal{D}$ be two infinite languages. Let $D = A - B$. Assume that $D$ is infinite. Our assumption guarantees the existence of a language $C \in$ REG such that $C$ dissects $D$. We define $C' = B \cup (A \cap C)$. Moreover, since $C$ dissects $D$, it follows that $|(A \cap C) - B| = \infty$ and $|(A \cap \overline{C}) - B| = \infty$. These imply that $B \subseteq C' \subseteq A$ and $|A - C'| = |C' - B| = \infty$. Thus, $C'$ i-separates $(B, A)$. Since $C \in$ REG, $C'$ belongs to the family $\mathcal{E}$. □
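The construction $C' = B \cup (A \cap C)$ in the proof can be tried out on a toy instance. The sketch below uses unary languages described by length residues; the particular choices of $A$, $B$, and $C$, as well as the helper names, are hypothetical and serve only to show both margins growing inside a finite window of strings.

```python
def length_mod(k: int, residues: set):
    """Predicate for the unary language {a^n : n mod k is in residues}."""
    return lambda x: set(x) <= {"a"} and len(x) % k in residues


A = length_mod(1, {0})           # a*, the hypothetical cover A
B = length_mod(4, {0})           # {a^n : n divisible by 4}, a subset of A
C = length_mod(2, {1})           # odd lengths: a regular language dissecting A - B


def separator(x: str) -> bool:
    """Membership in C' = B union (A intersect C)."""
    return B(x) or (A(x) and C(x))


universe = ["a" * n for n in range(40)]      # finite window of unary strings
in_A_not_sep = [x for x in universe if A(x) and not separator(x)]   # A - C'
in_sep_not_B = [x for x in universe if separator(x) and not B(x)]   # C' - B
```

In this toy instance, $A - C'$ collects the lengths congruent to 2 modulo 4 and $C' - B$ collects the odd lengths, so both lists keep growing as the window is enlarged. A finite window can only illustrate the infinite margins; it does not prove them.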

Concerning bounded context-free languages, we can show the following i-separation result. 

Theorem 7.3 For any $k \geq 1$, $\mathrm{BCFL}_k$ i-separates $(\mathrm{BCFL}_k, \mathrm{BCFL}_k)$.

Proof. We want to show that $\mathrm{BCFL}_k - \mathrm{BCFL}_k$ is REG-dissectable; by applying Lemma 7.2, we then immediately obtain the theorem. For our purpose, it suffices to show that $\mathrm{BCFL}_k - \mathrm{BCFL}_k$ is included in BCFL$_{BH}$, because BCFL$_{BH}$ is REG-dissectable by Theorem 6.6. More strongly, we want to prove that (*) for any indices $k, m \geq 1$, $\mathrm{BCFL}_k - \mathrm{BCFL}_m \subseteq \mathrm{BCFL}_{BH}$.

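For simplicity, let $\mathcal{F}_{k,m} = \mathrm{BCFL}_k - \mathrm{BCFL}_m = \mathrm{BCFL}_k \wedge \text{co-}\mathrm{BCFL}_m$ and $\mathcal{G}_{k,m} = \mathrm{BCFL}_k \wedge \mathrm{BCFL}_m$. We will show the above claim (*) by induction on $(k, m) \in \mathbb{N}^+ \times \mathbb{N}^+$. For the case $(1, 1)$, since $\mathcal{F}_{1,1} = \mathrm{BCFL}_2$ holds by the definition, clearly $\mathcal{F}_{1,1}$ is a subset of BCFL$_{BH}$. Moreover, for the case $(2, 1)$, it holds that $\mathcal{F}_{2,1} \subseteq \mathrm{BCFL}_4$ as well as $\mathcal{G}_{2,2} \subseteq \mathrm{BCFL}_4$ because $\mathrm{BCFL}_4 = (\mathrm{BCFL}_2 \wedge \text{co-}\mathrm{BCFL}_1) \vee (\mathrm{BCFL}_2 \wedge \mathrm{BCFL}_2) = \mathcal{F}_{2,1} \vee \mathcal{G}_{2,2}$.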

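For a general case $(k, m)$, it suffices to consider the case $(2n, 2m+1)$. Similar to Claim 4, we can prove the next useful relation.

Claim 5 $\text{co-}\mathrm{BCFL}_{2k+1} = \text{co-}\mathrm{BCFL}_{2k-1} \vee \mathrm{BCFL}_2$.

By Claims 4 and 5, $\mathcal{F}_{2n,2m+1}$ equals $(\mathrm{BCFL}_{2n-2} \vee \mathrm{BCFL}_2) \wedge (\text{co-}\mathrm{BCFL}_{2m-1} \vee \mathrm{BCFL}_2)$, which can be transformed into $\mathcal{F}_{2n-2,2m-1} \vee \mathcal{F}_{2,2m-1} \vee \mathcal{G}_{2n-2,2} \vee \mathcal{G}_{2,2}$. By the induction hypothesis, there are two indices $\ell_1, \ell_2$ such that $\mathcal{F}_{2n-2,2m-1} \subseteq \mathrm{BCFL}_{2\ell_1}$ and $\mathcal{F}_{2,2m-1} \subseteq \mathrm{BCFL}_{2\ell_2}$. By applying Claim 4 repeatedly, we then obtain $\mathrm{BCFL}_{2\ell_1} = \bigvee_{i=1}^{\ell_1} \mathrm{BCFL}_2$ and $\mathrm{BCFL}_{2\ell_2} = \bigvee_{i=1}^{\ell_2} \mathrm{BCFL}_2$. Similarly, we obtain $\mathrm{BCFL}_{2n-2} = \bigvee_{i=1}^{n-1} \mathrm{BCFL}_2$. Hence, $\mathcal{G}_{2n-2,2}$ equals $(\bigvee_{i=1}^{n-1} \mathrm{BCFL}_2) \wedge \mathrm{BCFL}_2 = \bigvee_{i=1}^{n-1} \mathcal{G}_{2,2}$, which is included in $\bigvee_{i=1}^{n-1} \mathrm{BCFL}_4 = \mathrm{BCFL}_{4(n-1)}$. Thus, we obtain $\mathcal{G}_{2n-2,2} \vee \mathcal{G}_{2,2} \subseteq \mathrm{BCFL}_{4n}$. It thus follows that $\mathcal{F}_{2n,2m+1} \subseteq \mathrm{BCFL}_{2\ell_1} \vee \mathrm{BCFL}_{2\ell_2} \vee \mathrm{BCFL}_{4n} = \bigvee_{i=1}^{\ell_1 + \ell_2 + 2n} \mathrm{BCFL}_2$. As discussed before, this is equivalent to $\mathrm{BCFL}_{2(\ell_1 + \ell_2 + 2n)}$, which is obviously included in BCFL$_{BH}$. Therefore, we conclude that $\mathcal{F}_{2n,2m+1} \subseteq \mathrm{BCFL}_{BH}$. □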

Without a restriction onto bounded languages, we prove only the following i-separation result concerning 
CFL. 

Theorem 7.4 CFL i-separates (CFL, REG). 

We give the proof of this theorem below. To use Lemma 7.2, it is sufficient for us to observe the following simple fact.



Lemma 7.5 Assume that a language family $\mathcal{C}$ is closed under union with regular languages. The following three statements are logically equivalent. (1) $\mathrm{REG} - \mathcal{C}$ is REG-dissectable. (2) co-$\mathcal{C}$ is REG-dissectable. (3) $\{\Sigma^*\} - \mathcal{C}$ is REG-dissectable.

Proof. (2) $\Leftrightarrow$ (3). Trivial. (1) $\Rightarrow$ (3). Trivial. (3) $\Rightarrow$ (1). Consider $A \in$ REG and $B \in \mathcal{C}$. We define $B' = B \cup \overline{A}$. By the property of $\mathcal{C}$, $B'$ belongs to $\mathcal{C}$. Assume that a certain $C \in$ REG dissects $\Sigma^* - B'$. This implies that $|C \cap (\Sigma^* - B')| = |\overline{C} \cap (\Sigma^* - B')| = \infty$. Since $\Sigma^* - B' = A - B$, we have $|C \cap (A - B)| = |\overline{C} \cap (A - B)| = \infty$. We thus conclude (1). □
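The step from (3) to (1) rests on the identity $\Sigma^* - (B \cup \overline{A}) = A - B$. The following sketch checks this identity by brute force on all strings of length at most 4 over a binary alphabet. The two sample languages are hypothetical stand-ins (a parity language for $A$ and the palindromes for $B$) and are not taken from the paper.

```python
from itertools import product

SIGMA = "ab"
# Finite window of Sigma*: every string of length at most 4.
universe = {"".join(t) for n in range(5) for t in product(SIGMA, repeat=n)}

A = {x for x in universe if x.count("a") % 2 == 0}   # hypothetical regular language
B = {x for x in universe if x == x[::-1]}            # hypothetical member of the family

B_prime = B | (universe - A)     # B' = B union complement(A)
lhs = universe - B_prime         # Sigma* - B', restricted to the window
rhs = A - B                      # A - B
```

Within the window, `lhs` and `rhs` coincide, so any regular language that dissects $\Sigma^* - B'$ dissects $A - B$ as well.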

Finally, we present the proof of the desired i-separation result. 

Proof of Theorem 7.4. By Proposition 6.1, we obtain the REG-dissectability of co-CFL. By Lemma 7.5, this means that REG $-$ CFL is REG-dissectable. By Lemma 7.2, we can conclude that CFL i-separates (CFL, REG). □



8 Discussions and Open Problems 

We have initiated a fundamental study on the regular languages' power of dissecting given infinite languages. Although we have developed several proof techniques and proven several basic results, unfortunately, we have left unsolved a number of intriguing questions. For instance, we have shown the REG-dissectability of BCFL$_k$ and BCFL($k$) for each index $k \geq 2$; however, we have not answered the following key question.

Open Problem 8.1 Are CFL$_k$ and CFL($k$) REG-dissectable for any $k \geq 2$?

When we move our attention from CFL to two other language families, 1-C$_=$LIN and 1-PLIN, which were introduced in [11] as natural analogues of C$_=$P and PP, respectively, in computational complexity theory, we have no answer to the following question.

Open Problem 8.2 Are 1-C$_=$LIN and 1-PLIN REG-dissectable?

Concerning the i-separation of (CFL, CFL), the following question still awaits its answer.

Open Problem 8.3 Does CFL i-separate (CFL, CFL)?



Acknowledgments. The authors are grateful to Jeffrey Shallit for having drawn the authors' attention to [3], whose core concept helped formulate an initial notion of "dissectability."